QVAC-20984 feat: add analytic gradchecked backward pass for the CAMPPlus speaker encoder#61

Merged
GustavoA1604 merged 2 commits into master from QVAC-20984/ggml-backward-campplus
Jun 22, 2026

@freddy311082


What

Makes the CAMPPlus speaker encoder differentiable for voice-clone enrollment by
adding an analytic, model-free C++ backward pass that returns d(loss)/d(fbank),
validated against the Task 2 finite-difference gradcheck harness. In the
enrollment loop CAMPPlus provides the speaker-similarity loss. The target-WAV
embedding is forward-only (a constant) and only the generated-audio path needs
gradients, so the required gradient is with respect to the input fbank, with the
model weights frozen.
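The backward starts from the embedding gradient d_emb. The PR does not state the exact form of the speaker-similarity loss, so the sketch below assumes a cosine loss L = 1 - cos(g, t), where g is the generated-audio embedding and t is the constant target-WAV embedding. `cosine_loss_and_grad` is a hypothetical helper, not part of this change; only g receives a gradient, mirroring the forward-only target path.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (assumed loss form): L = 1 - cos(g, t).
// g: embedding of the generated audio; t: constant target-WAV embedding.
// Returns L and writes dL/dg into d_emb. No gradient is produced for t.
double cosine_loss_and_grad(const std::vector<double>& g,
                            const std::vector<double>& t,
                            std::vector<double>& d_emb) {
    assert(g.size() == t.size() && !g.empty());
    double dot = 0.0, gg = 0.0, tt = 0.0;
    for (std::size_t i = 0; i < g.size(); ++i) {
        dot += g[i] * t[i];
        gg += g[i] * g[i];
        tt += t[i] * t[i];
    }
    const double ng = std::sqrt(gg), nt = std::sqrt(tt);
    const double cos = dot / (ng * nt);
    d_emb.resize(g.size());
    for (std::size_t i = 0; i < g.size(); ++i) {
        // d cos / d g_i = t_i / (|g| |t|) - cos * g_i / |g|^2
        const double dcos = t[i] / (ng * nt) - cos * g[i] / gg;
        d_emb[i] = -dcos;  // because L = 1 - cos
    }
    return 1.0 - cos;
}
```

Under that assumption, the resulting vector is what would be passed to `backward(d_emb)` to obtain d(loss)/d(fbank).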

Follows the same pattern as the sibling tickets already on master
(#55 text-encoder tail / QVAC-20978, #58 vector estimator / QVAC-20982,
#60 vocoder / QVAC-20983): a pure double reference backward, gradchecked
component-wise, with the op×backend gap documented. Dependencies: Task 2
(QVAC-20979).
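A component-wise gradcheck can be sketched as follows. The Task 2 harness API is not shown in this PR, so `gradcheck_central` is a hypothetical stand-in that compares an analytic input-gradient with central differences. Sigmoid serves as the example primitive, reduced to a scalar through fixed weights so that the upstream gradient is not all ones.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the Task 2 harness: compares `analytic` with the
// central difference (f(x + h e_i) - f(x - h e_i)) / (2h) for every input
// coordinate and returns the worst relative error.
double gradcheck_central(const std::function<double(const std::vector<double>&)>& f,
                         std::vector<double> x,
                         const std::vector<double>& analytic,
                         double h = 1e-5) {
    assert(x.size() == analytic.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double xi = x[i];
        x[i] = xi + h;
        const double fp = f(x);
        x[i] = xi - h;
        const double fm = f(x);
        x[i] = xi;  // restore the coordinate
        const double numeric = (fp - fm) / (2.0 * h);
        const double denom = std::max({1e-12, std::fabs(numeric), std::fabs(analytic[i])});
        worst = std::max(worst, std::fabs(numeric - analytic[i]) / denom);
    }
    return worst;
}

// Example primitive: elementwise sigmoid.
double sigmoid(double v) { return 1.0 / (1.0 + std::exp(-v)); }

// Input-gradient of y = sigmoid(x) given upstream dy: dx = dy * s * (1 - s).
std::vector<double> sigmoid_backward(const std::vector<double>& x,
                                     const std::vector<double>& dy) {
    assert(x.size() == dy.size());
    std::vector<double> dx(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double s = sigmoid(x[i]);
        dx[i] = dy[i] * s * (1.0 - s);
    }
    return dx;
}
```

Projecting through random weights is a common way to turn a vector-valued primitive into a scalar test loss while still exercising every output.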

Forward-parity anchor

A gradcheck alone is self-referential: it only proves that the backward is the
exact derivative of its own forward. To tie that to the real model, a second test
asserts that the analytic double forward matches the production scalar forward
(campplus_embed_cpu) on synthetic weights (max_abs ≈ 3e-8, i.e. float-vs-double
rounding only). Building this test revealed that fcm_forward in campplus_embed_cpu
hardcodes the input feature dim to 80, so the production CPU path is only
self-consistent at feat_dim=80, and the parity test uses that value. The analytic
backward derives every dimension from feat_dim, so it is geometry-agnostic.
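The parity numbers can be read as the comparison sketched below, assuming the production output is float and the reference is double. `parity` and `ParityStats` are illustrative names, not the test's actual helpers. For scale, float's unit roundoff is 2^-24 ≈ 6e-8, so a max_abs around 3e-8 on O(1) values is consistent with pure float-vs-double rounding.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative parity metric between a float production output and the
// double reference forward. max_rel uses a small floor so near-zero
// reference entries do not inflate the ratio.
struct ParityStats {
    double max_abs;
    double max_rel;
};

ParityStats parity(const std::vector<float>& prod, const std::vector<double>& ref,
                   double rel_floor = 1e-6) {
    assert(prod.size() == ref.size());
    ParityStats s{0.0, 0.0};
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double diff = std::fabs(static_cast<double>(prod[i]) - ref[i]);
        s.max_abs = std::max(s.max_abs, diff);
        s.max_rel = std::max(s.max_rel, diff / std::max(rel_floor, std::fabs(ref[i])));
    }
    return s;
}
```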

Changes

  • src/campplus_backward.{h,cpp} — new CampplusBackward class (namespace
    cp_grad). Owns the frozen weights and caches per-call activations as state;
    public surface is forward(fbank) / backward(d_emb). Channel-major (C, T)
    layout mirroring campplus_embed_cpu exactly. Implements the CAMPPlus
    primitives and their input-gradients: stride/pad/dilation-aware conv1d/conv2d,
    pre-fused affine batch norm, ReLU, sigmoid, time-mean, segment pooling,
    statistics pooling (mean + unbiased std), the FCM Conv2d residual block (with
    optional shortcut) and the CAMDenseTDNN layer (context-attention gate + dense
    concat split).
  • test/test_campplus_backward.cpp — gradchecks every primitive, the FCM
    residual block, the CAM dense-TDNN layer and the full chain (12 checks) against
    central finite differences via the Task 2 harness. Always-on unit ctest tier
    (no model/fixtures, no-skip policy).
  • test/test_campplus_backward_parity.cpp — forward parity vs the production
    campplus_embed_cpu (see above). Also unit tier.
  • docs/voiceclone-backward-campplus.md — op×backend gap matrix and
    CPU-fallback rationale for enrollment.
  • CMakeLists.txt — register the test-campplus-backward and
    test-campplus-backward-parity targets.
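As an example of one listed primitive, statistics pooling (mean + unbiased std) and its input-gradient can be sketched on a channel-major (C, T) buffer. The output order (all means, then all stds) and the absence of an epsilon under the square root are assumptions of this sketch. The real class caches activations, whereas this version recomputes the forward to stay self-contained. Because the deviations x_ct - mean_c sum to zero, d std_c / d x_ct reduces to (x_ct - mean_c) / ((T - 1) std_c).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Statistics pooling on channel-major x[c * T + t].
// Assumed output layout: [mean_0..mean_{C-1}, std_0..std_{C-1}], unbiased std.
std::vector<double> stats_pool_forward(const std::vector<double>& x,
                                       std::size_t C, std::size_t T) {
    assert(x.size() == C * T && T > 1);
    std::vector<double> out(2 * C);
    for (std::size_t c = 0; c < C; ++c) {
        const double* row = &x[c * T];
        double mean = 0.0;
        for (std::size_t t = 0; t < T; ++t) mean += row[t];
        mean /= static_cast<double>(T);
        double ss = 0.0;
        for (std::size_t t = 0; t < T; ++t) ss += (row[t] - mean) * (row[t] - mean);
        out[c] = mean;
        out[C + c] = std::sqrt(ss / static_cast<double>(T - 1));
    }
    return out;
}

// Input-gradient given d_out (same layout as the forward output):
//   d mean_c / d x_ct = 1 / T
//   d std_c  / d x_ct = (x_ct - mean_c) / ((T - 1) * std_c)
std::vector<double> stats_pool_backward(const std::vector<double>& x,
                                        std::size_t C, std::size_t T,
                                        const std::vector<double>& d_out) {
    assert(d_out.size() == 2 * C);
    const std::vector<double> y = stats_pool_forward(x, C, T);
    std::vector<double> dx(C * T);
    for (std::size_t c = 0; c < C; ++c) {
        const double mean = y[c], sd = y[C + c];
        for (std::size_t t = 0; t < T; ++t) {
            double g = d_out[c] / static_cast<double>(T);
            if (sd > 0.0) {  // a constant row has std 0; skip the undefined term
                g += d_out[C + c] * (x[c * T + t] - mean) /
                     (static_cast<double>(T - 1) * sd);
            }
            dx[c * T + t] = g;
        }
    }
    return dx;
}
```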

CPU fallback (documented)

SIGMOID, SQRT, MEAN, SUM_ROWS, PAD, REPEAT and CONCAT have no
backward in the vendored ggml, so the enrollment backward cannot use ggml
autodiff on any backend. Instead it is provided as the analytic C++ backward,
which runs on CPU (enrollment is offline; the realtime synthesis GPU fast paths
are untouched). See docs/voiceclone-backward-campplus.md for the full matrix.
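The analytic replacement for the missing SIGMOID and REPEAT backwards is short. The sketch below uses a simplified per-channel gate broadcast over time, y[c, t] = a[c, t] * sigmoid(z[c]). It illustrates the two adjoints (REPEAT becomes a sum over time, as SUM_ROWS would compute; SIGMOID multiplies by s(1 - s)) and is not the exact CAMDenseTDNN context-attention gate. Function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Simplified gate (illustrative, not the exact CAM layer), channel-major:
//   y[c * T + t] = a[c * T + t] * sigmoid(z[c])
void gate_forward(const std::vector<double>& a, const std::vector<double>& z,
                  std::size_t C, std::size_t T, std::vector<double>& y) {
    assert(a.size() == C * T && z.size() == C);
    y.resize(C * T);
    for (std::size_t c = 0; c < C; ++c) {
        const double s = 1.0 / (1.0 + std::exp(-z[c]));
        for (std::size_t t = 0; t < T; ++t) y[c * T + t] = a[c * T + t] * s;
    }
}

// Backward given dy:
//   da[c, t] = dy[c, t] * s_c
//   dz[c]    = s_c * (1 - s_c) * sum_t dy[c, t] * a[c, t]   (REPEAT adjoint = sum over t)
void gate_backward(const std::vector<double>& a, const std::vector<double>& z,
                   std::size_t C, std::size_t T, const std::vector<double>& dy,
                   std::vector<double>& da, std::vector<double>& dz) {
    assert(a.size() == C * T && z.size() == C && dy.size() == C * T);
    da.resize(C * T);
    dz.assign(C, 0.0);
    for (std::size_t c = 0; c < C; ++c) {
        const double s = 1.0 / (1.0 + std::exp(-z[c]));
        double acc = 0.0;
        for (std::size_t t = 0; t < T; ++t) {
            da[c * T + t] = dy[c * T + t] * s;
            acc += dy[c * T + t] * a[c * T + t];
        }
        dz[c] = acc * s * (1.0 - s);
    }
}
```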

Acceptance

Gradcheck green: test-campplus-backward and test-campplus-backward-parity
both pass (2/2 in the unit tier).


Make CAMPPlus differentiable for the voice-clone enrollment loop: an analytic
C++ backward returning d(loss)/d(fbank) with frozen weights (target-WAV
embedding stays forward-only). Mirrors campplus_embed_cpu in channel-major
layout. Covers FCM (Conv2d + residual blocks), TDNN, CAMDenseTDNN blocks
(context-attention gate + dense concat), stats pooling and the dense head.
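In the channel-major layout, the stride/pad/dilation-aware conv1d input-gradient is a scatter of each output gradient back to the input taps that produced it. A sketch, assuming a PyTorch-style (C_out, C_in, K) weight layout and omitting the bias (it does not affect d/dx). `Conv1dGeom`, `tap` and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Geometry of a 1-D convolution: x is (c_in, t_in), w is (c_out, c_in, k),
// y is (c_out, t_out), all channel-major.
struct Conv1dGeom {
    std::size_t c_in, c_out, k, t_in;
    long stride, pad, dil;
    std::size_t t_out() const {
        const long span = dil * static_cast<long>(k - 1) + 1;  // receptive field
        const long num = static_cast<long>(t_in) + 2 * pad - span;
        assert(num >= 0 && stride > 0);
        return static_cast<std::size_t>(num / stride + 1);
    }
};

// Input position read by output step t and kernel tap kk, or -1 inside the zero padding.
long tap(const Conv1dGeom& g, std::size_t t, std::size_t kk) {
    const long p = static_cast<long>(t) * g.stride + static_cast<long>(kk) * g.dil - g.pad;
    return (p >= 0 && p < static_cast<long>(g.t_in)) ? p : -1;
}

std::vector<double> conv1d_forward(const Conv1dGeom& g, const std::vector<double>& x,
                                   const std::vector<double>& w) {
    assert(x.size() == g.c_in * g.t_in && w.size() == g.c_out * g.c_in * g.k);
    const std::size_t T_out = g.t_out();
    std::vector<double> y(g.c_out * T_out, 0.0);
    for (std::size_t o = 0; o < g.c_out; ++o)
        for (std::size_t t = 0; t < T_out; ++t)
            for (std::size_t i = 0; i < g.c_in; ++i)
                for (std::size_t kk = 0; kk < g.k; ++kk) {
                    const long p = tap(g, t, kk);
                    if (p < 0) continue;  // zero padding contributes nothing
                    y[o * T_out + t] += w[(o * g.c_in + i) * g.k + kk] *
                                        x[i * g.t_in + static_cast<std::size_t>(p)];
                }
    return y;
}

// Input-gradient: every dy[o, t] flows back to the inputs it was computed from.
std::vector<double> conv1d_backward_input(const Conv1dGeom& g, const std::vector<double>& w,
                                          const std::vector<double>& dy) {
    const std::size_t T_out = g.t_out();
    assert(dy.size() == g.c_out * T_out);
    std::vector<double> dx(g.c_in * g.t_in, 0.0);
    for (std::size_t o = 0; o < g.c_out; ++o)
        for (std::size_t t = 0; t < T_out; ++t)
            for (std::size_t i = 0; i < g.c_in; ++i)
                for (std::size_t kk = 0; kk < g.k; ++kk) {
                    const long p = tap(g, t, kk);
                    if (p < 0) continue;
                    dx[i * g.t_in + static_cast<std::size_t>(p)] +=
                        w[(o * g.c_in + i) * g.k + kk] * dy[o * T_out + t];
                }
    return dx;
}
```

Since the forward is linear in x, this backward is exactly its transpose, which a finite-difference check confirms up to rounding.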

Tests (always-on unit tier, model-free):
- test-campplus-backward: gradcheck every primitive + full chain vs central
  finite differences (Task 2 harness).
- test-campplus-backward-parity: analytic double forward vs production
  campplus_embed_cpu on synthetic weights.

QVAC-20984
@freddy311082 requested review from a team as code owners on June 19, 2026 22:59
github-actions bot commented Jun 19, 2026

Review Status

Current Status: ❌ PENDING
Approvals so far: none

Pending reviews: Needs 1 Management or Team Lead, and 1 more from Management, Team Lead, or Member.

Address PR #61 review notes (non-blocking):

- Parity test now builds CAM blocks with num_layers 2/3/2 (was 1/1/1) so the
  dense-concat accumulation (layer i enters with C_in + i*growth) is anchored to
  the production forward, not only to the self-referential full-chain gradcheck.
  Parity stays green (max_abs ~4.6e-08, max_rel ~8.9e-08).
- Document the trust chain in the parity test header and the gap-matrix doc:
  every campplus_embed caller in the repo (main.cpp, test-campplus,
  test-voice-embedding) uses the scalar CPU forward, which is validated against
  the Python reference; campplus_embed_ggml is not wired to any caller yet.
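The dense-concat accumulation that the updated parity test anchors can be sketched as channel bookkeeping plus a split. Because the (C, T) layout is channel-major, concatenating along channels is a contiguous append, so its backward is a contiguous split. Function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense block bookkeeping: layer i reads c_in + i * growth channels and
// appends growth new ones, so the block emits c_in + num_layers * growth.
std::vector<std::size_t> dense_layer_in_channels(std::size_t c_in, std::size_t growth,
                                                 std::size_t num_layers) {
    std::vector<std::size_t> in(num_layers);
    for (std::size_t i = 0; i < num_layers; ++i) in[i] = c_in + i * growth;
    return in;
}

// Backward of cat(prev, new) along channels in channel-major (C, T) layout:
// the first c_prev * T entries belong to the running input, the rest to the
// new layer's output.
void concat_split_backward(const std::vector<double>& d_cat, std::size_t c_prev,
                           std::size_t growth, std::size_t T,
                           std::vector<double>& d_prev, std::vector<double>& d_new) {
    assert(d_cat.size() == (c_prev + growth) * T);
    const auto mid = d_cat.begin() + static_cast<std::ptrdiff_t>(c_prev * T);
    d_prev.assign(d_cat.begin(), mid);
    d_new.assign(mid, d_cat.end());
}
```

In a dense block the running input also feeds the layer itself, so its total gradient is the split slice plus that layer's own input-gradient applied to d_new.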
@GustavoA1604 merged commit f4208d2 into master on Jun 22, 2026
70 of 75 checks passed